SimpleETL: ETL Processing by Simple Specifications

نویسندگان

  • Ove Andersen
  • Christian Thomsen
  • Kristian Torp
چکیده

Massive quantities of data are today collected from many sources. However, it is often labor-intensive to handle and integrate these data sources into a data warehouse. Further, the complexity is increased when specific requirements exist. One such new requirement, is the right to be forgottenwhere an organization upon request must delete all data about an individual. Another requirement is when facts are updated retrospectively. In this paper, we present the general framework SimpleETL which is currently used for Extract-Transform-Load (ETL) processing in a company with such requirements. SimpleETL automatically handles all database interactions such as creating fact tables, dimensions, and foreign keys. The framework also has features for handling version management of facts and implements four different methods for handling deleted facts. The framework enables, e.g., data scientists, to program complete and complex ETL solutions very efficiently with only few lines of code, which is demonstrated with a real-world example.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

UnifiedViews: Towards ETL Tool for Simple yet Powerfull RDF Data Management

We present UnifiedViews, an Extract-Transform-Load (ETL) framework that allows users to define, execute, monitor, debug, schedule, and share ETL data processing tasks, which may employ custom plugins (data processing units, DPUs) created by users. UnifiedViews differs from other ETL frameworks by natively supporting RDF data and ontologies. In this paper, we introduce UnifiedViews and discuss f...

متن کامل

Requirements Analysis Method For Extracting-Transformation-Loading (Etl) In Data Warehouse Systems

The data warehouse (DW) system design involves several tasks such as defining the DW schemas and the ETL processes specifications, and these have been extensively studied and practiced for many years. The problems in heterogeneous data integration are still far from being resolved due to the complexity of ETL processes and the fundamental problems of data conflicts in information sharing enviro...

متن کامل

Entropy Guided Transformation Learning

This work presents Entropy Guided Transformation Learning (ETL), a new machine learning algorithm for classification tasks. It generalizes Transformation Based Learning (TBL) by automatically solving the TBL bottleneck: the construction of good template sets. We also present ETL Committee, an ensemble method that uses ETL as the base learner. The main advantage of ETL is its easy applicability ...

متن کامل

Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading

BACKGROUND Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured and transformed into a common format and standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to tran...

متن کامل

Rameps: a Goal-ontology Approach to Analyse the Requirements for Data Warehouse Systems

The data warehouse (DW) systems design involves several tasks such as defining the DW schemas and the ETL processes specifications, and these have been extensively studied and practiced for many years. However, the problems in heterogeneous data integration are still far from being resolved due to the complexity of ETL processes and the fundamental problems of data conflicts in information shar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2018